Applied Data Science Capstone by IBM/Coursera
In this project we will try to find an optimal location for a Yoga Center. This report is targeted at stakeholders interested in starting a new Yoga Center in Bangalore, India.
Bangalore, also known as the Silicon Valley of India, is home to millions of Indians from all over India working in various IT companies. A booming cosmopolitan population has meant huge growth potential in the tertiary service industry. Naturally, in recent times the franchise model of business has taken hold in many parts of Bangalore. The same holds true for the Health and Fitness sector.
As more and more people become fitness aware, competition has become cut-throat. Businesses therefore have to look into niche offerings like Yoga, Zumba etc. A dedicated Yoga Center is quite a lucrative offering since it addresses both mind and body fitness. The target audience is also larger, since Yoga can be taken up by people of all age groups, whereas a regular Gym or Zumba center may only target the population between 20 and 40.
While a Gym may not be in direct competition with a Yoga Center, we will try to find locations which do not have any fitness center in the vicinity. We will also seek out locations which are close to residences, offices, colleges and commercial centers.
We will utilize tools and techniques in data science to generate a shortlist of the most favorable areas based on the above criteria.
Based on definition of our problem, factors that will influence our decision are
Bangalore has several suburbs and satellite locations which may not be ideal for starting a new business. So first we will determine the effective subset of Bangalore for our analysis.
Then we will define closely packed circular cells with a radius of around 500 metres for our analysis.
We will use the following data sources to generate the required information:
!pip install opencage
!pip install shapely
!pip install pyproj
!pip install folium
import pandas as pd
import numpy as np
import folium
import pyproj
import math
import pickle
from sklearn.cluster import KMeans
import requests
from pandas import json_normalize  # pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize
from IPython.display import HTML
import base64
def create_download_link( df, title = "Download CSV file", filename = "data.csv"):
    # Encode the DataFrame as base64 CSV and wrap it in an HTML download link
    csv = df.to_csv()
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)
web_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore',header=0)
frames = []
for i in range(0,8):
    frames.append(web_data[i])
bangalore = pd.concat(frames)
bangalore.drop(['Image','Summary'],axis=1,inplace = True)
bangalore.reset_index(drop=True,inplace=True)
bangalore["Latitude"] = np.zeros(len(bangalore),dtype = float)
bangalore["Longitude"] = np.zeros(len(bangalore),dtype = float)
bangalore.head()
# determine the Spatial coordinates of Bangalore
from opencage.geocoder import OpenCageGeocode
from pprint import pprint
import numpy as np
blr_lat = 0.0
blr_long = 0.0
geocoder = OpenCageGeocode(GEO_KEY)
query = u'Bangalore,Karnataka,India'
results = geocoder.geocode(query)
blr_lat = results[0]['geometry']['lat']
blr_long = results[0]['geometry']['lng']
#print(results[0]['geometry']['lat'],results[0]['geometry']['lng'])
print("Coordinates of Bangalore ",blr_lat,blr_long)
blr_center = [blr_lat,blr_long]
for i in bangalore.index:
    query = u''+str(bangalore['Name'][i])+' Bangalore,Karnataka,India'
    results = geocoder.geocode(query)
    bangalore.at[i,'Latitude'] = results[0]['geometry']['lat']
    bangalore.at[i,'Longitude']= results[0]['geometry']['lng']
bangalore.head()
blr_lat = np.mean(bangalore['Latitude'])
blr_long = np.mean(bangalore['Longitude'])
blr_center = [blr_lat,blr_long]
print ('Total number of Localities in Bangalore',len(bangalore))
print('Bangalore center longitude={}, latitude={}'.format(blr_center[1], blr_center[0]))
Now we proceed to create an equally spaced grid of circular cells with centers 1 km apart, so that each cell has a radius of 500 m.
To calculate distances accurately we need to create our grid of locations in a 2D Cartesian coordinate system, which lets us work in meters rather than degrees.
So we will create transform functions to convert between the WGS84 spherical coordinate system (latitude/longitude in degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).
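To see why degrees are a poor unit for distance, here is a minimal sketch that estimates how many meters one degree of latitude and one degree of longitude cover. It assumes a spherical Earth with a mean radius of 6371 km and uses 12.97 as an approximate latitude for Bangalore; both values are assumptions for illustration, and the helper name `metres_per_degree` is hypothetical rather than part of this notebook.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius (spherical approximation)

def metres_per_degree(lat_deg):
    """Return (metres per degree of latitude, metres per degree of longitude)."""
    per_degree_lat = math.pi * EARTH_RADIUS_M / 180
    # Meridians converge towards the poles, so a degree of longitude shrinks with cos(latitude)
    per_degree_lon = per_degree_lat * math.cos(math.radians(lat_deg))
    return per_degree_lat, per_degree_lon

# Approximate latitude of Bangalore (assumed; the notebook obtains it from the geocoder)
lat_m, lon_m = metres_per_degree(12.97)
print(round(lat_m), round(lon_m))
```

Because the two figures differ, a circle drawn in degrees would not be a circle on the ground, which is the motivation for working in projected UTM meters.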
# Transformation routines: convert spatial (lon/lat) coordinates to Cartesian (UTM) coordinates and vice versa
def lonlat_to_xy(lon,lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=43, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy,lon,lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=43, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0],lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance in meters between two UTM points
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
print('Coordinate transformation check')
print('-------------------------------')
print('Bangalore center longitude={}, latitude={}'.format(blr_center[1], blr_center[0]))
x, y = lonlat_to_xy(blr_center[1], blr_center[0])
print('Bangalore center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Bangalore center longitude={}, latitude={}'.format(lo, la))
The steps below identify the effective radius of Bangalore to be used for the analysis.
# Compute distance of the farthest Neighbourhood from the center
blr_center_x, blr_center_y = lonlat_to_xy(blr_center[1], blr_center[0])
Dmax = 0.0
dist = np.zeros(len(bangalore),dtype =float)
for i in bangalore.index:
    n_x,n_y = lonlat_to_xy(bangalore['Longitude'][i],bangalore['Latitude'][i])
    d = calc_xy_distance(blr_center_x, blr_center_y, n_x, n_y)
    dist[i] = d
    if d > Dmax:
        Dmax=d
max_dist = int(Dmax)
avg_dist = int(np.mean(dist))
median_dist = int(np.median(dist))
print(avg_dist)
print(median_dist)
print(max_dist)
bangalore['Distance from Center'] = dist
# We map the suburbs and see which radius best brings in maximum number of suburbs
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Circle(blr_center,radius = avg_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = median_dist,color ='red', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = max_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
for lat, lon in zip(bangalore['Latitude'], bangalore['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
map_bangalore
We observe that the smallest of these radii, 7.7 km, would still leave a lot of suburbs out of the reckoning. So as a trial we take the mean of the three distances and check whether the resulting radius is workable.
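The averaging idea can be illustrated on made-up numbers. This is only a sketch: the distances in `sample` are hypothetical, and the helper name `effective_radius` is not part of the notebook. It shows how one far-flung suburb inflates the maximum, while the average of mean, median and maximum (rounded to the nearest kilometer with `round(..., -3)`) lands in between.

```python
import statistics

def effective_radius(distances_m):
    """Average the mean, median and max distance, rounded to the nearest 1000 m."""
    mean_d = statistics.mean(distances_m)
    median_d = statistics.median(distances_m)
    max_d = max(distances_m)
    return round((mean_d + median_d + max_d) / 3, -3)

# Hypothetical distances from the city center, in meters
sample = [2000, 5000, 7000, 9000, 40000]
radius = effective_radius(sample)
print(radius)
```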
# From the map above we see that taking the max distance brings a lot of areas under Bangalore Rural into our analysis.
# Let's average the mean, median and max distances to arrive at an approximate radius that limits the area of our analysis
blr_radius = round((median_dist+max_dist+avg_dist)/3,-3)
print(blr_radius)
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Circle(blr_center,radius = avg_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = median_dist,color ='red', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = max_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = blr_radius,color ='green', popup='Bangalore').add_to(map_bangalore)
for lat, lon in zip(bangalore['Latitude'], bangalore['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
map_bangalore
So we will now limit our analysis to an area of roughly 1017 square km.
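As a quick check of that figure, assuming the study radius of 18 km that the later cells use (the 18000 m passed to the grid filter and to Foursquare), the area of the circle works out as:

$$A = \pi r^2 = \pi \times (18\ \text{km})^2 \approx 1017.9\ \text{km}^2$$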
Now we proceed to create circular cells with centers spaced 1 km apart.
# The algorithm below divides the circular area of Bangalore into uniform circular cells of 500 m radius.
# Each circle is the smallest unit that will be analysed for Gyms and Yoga Centers
blr_center_x, blr_center_y = lonlat_to_xy(blr_center[1], blr_center[0]) # City center in Cartesian coordinates
x_min = blr_center_x - blr_radius
locality_radius = 500
x_step = 2*locality_radius
y_min = blr_center_y - blr_radius
y_step = 2 * locality_radius
x_max = blr_center_x + blr_radius
y_max = blr_center_y + blr_radius
n_max = int((2*blr_radius/x_step))
print(y_min,x_min)
latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0,n_max+1):
    y = y_min + i*y_step
    for j in range(0,n_max+1):
        x = x_min + j*x_step
        distance_from_center = calc_xy_distance(blr_center_x, blr_center_y, x, y)
        # Keep only cells whose center lies within the 18 km study radius
        if (distance_from_center < 18001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
print(len(latitudes), 'candidate neighborhood centers generated.')
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Marker(blr_center, popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = blr_radius,color ='red', popup='Bangalore').add_to(map_bangalore)
#for lat, lon in zip(bangalore['Latitude'], bangalore['Longitude']):
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
    folium.Circle([lat, lon],radius=locality_radius,color='blue', fill=False).add_to(map_bangalore)
    #folium.Marker([lat, lon]).add_to(map_bangalore)
map_bangalore
The cells effectively cover the area of Bangalore under study.
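The grid construction can also be sketched without the projection step, working directly in meters around an origin. This is a simplified illustration, not the notebook's own routine: the function name `grid_centers` and the small radius used in the example are hypothetical, and the filter here uses "less than or equal to the radius" for clarity.

```python
import math

def grid_centers(center_x, center_y, area_radius, cell_radius):
    """Return centers of a square lattice of cells that fall inside a circular area."""
    step = 2 * cell_radius                  # neighbouring cell centers are one diameter apart
    n_max = int(2 * area_radius / step)     # number of steps across the bounding square
    centers = []
    for i in range(n_max + 1):
        y = center_y - area_radius + i * step
        for j in range(n_max + 1):
            x = center_x - area_radius + j * step
            # Discard lattice points outside the circular study area
            if math.hypot(x - center_x, y - center_y) <= area_radius:
                centers.append((x, y))
    return centers

# Tiny example: a 1 km study radius with 500 m cells
small_grid = grid_centers(0, 0, area_radius=1000, cell_radius=500)
print(len(small_grid))
```

With circles placed on a square lattice, neighbouring cells touch but leave small uncovered gaps between every four circles, which is one reason the venue queries later use a slightly larger search radius than the cell radius.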
Now we use the OpenCage API for reverse geocoding to get the effective address of each cell.
from opencage.geocoder import InvalidInputError, RateLimitExceededError, UnknownError
def get_locality_address(api_key,lat,long):
    key = api_key
    try:
        results = geocoder.reverse_geocode(lat,long)
        if results and len(results):
            return (results[0]['formatted'])
    except RateLimitExceededError as ex:
        return None
addr = get_locality_address(GEO_KEY,blr_center[0],blr_center[1])
print(addr)
bangalore_localities = []
for lat, lon in zip(latitudes, longitudes):
    local_addr = get_locality_address(GEO_KEY, lat, lon)
    if local_addr is None:
        local_addr = 'NO ADDRESS'
    local_addr = local_addr.replace(', India', '') # We don't need country part of address
    bangalore_localities.append(local_addr)
bangalore_localities[100:110]
# Now we create a composite dataframe having all the localities of Bangalore with addresses, coordinates, distance from center
blr_locations = pd.DataFrame({'Address': bangalore_localities,
'Latitude': latitudes,
'Longitude': longitudes,
'X': xs,
'Y': ys,
'Distance from center': distances_from_center})
blr_locations.head()
Foursquare
We use the Foursquare API to get information on Gyms/Fitness centers in each location. We will also determine the Yoga Centers in each location. In addition, we shall fetch the number of Offices, Residential Locations, Colleges and Commercial Activity centers in each cell as well as in the whole of Bangalore. Collectively these will be referred to as amenities. This will help us in assessing a cell's profile.
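The request URL used for these lookups can be assembled with the standard library, as in the minimal sketch below. The endpoint and parameter names are the ones that appear in this notebook's code; the helper name `build_explore_url` and every value passed to it (credentials, version date, category ID, coordinates) are placeholders assumed for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lon, category_id, radius, limit):
    """Build a venues/explore URL for one cell and one venue category."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),   # latitude,longitude of the cell center
        'categoryId': category_id,
        'radius': radius,                 # search radius in meters
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder values only; real credentials and category IDs are not shown in this notebook
url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 12.97, 77.59, 'CATEGORY_ID', 500, 100)
print(url)
```

Using `urlencode` rather than string formatting escapes special characters in the parameters, which avoids malformed URLs if a value ever contains a space or an ampersand.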
def get_category_type(row):
    # The categories column is named differently depending on the response shape
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
def get_local_feature(lat,lon,category,rradius):
    url =('https://api.foursquare.com/v2/venues/explore?client_id={}'
          '&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}').format(CLIENT_ID,
                                                                                     CLIENT_SECRET,
                                                                                     VERSION,
                                                                                     lat,
                                                                                     lon,
                                                                                     category,
                                                                                     rradius,
                                                                                     LIMIT)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        nearby_venues = json_normalize(results)
        # Filter the columns
        filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng','venue.location.distance']
        nearby_venues = nearby_venues.loc[:, filtered_columns]
        # Filter the category for each row
        nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)
        # Clean all column names
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
        # print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
    except Exception:
        # No venues (or a failed request): return an empty result
        nearby_venues = []
    return nearby_venues
# Getting the Distribution of Offices and Residences in Bangalore
def get_location_profile(lat,lon,radius,limit,categories):
    profile = {}
    for category in categories:
        url =('https://api.foursquare.com/v2/venues/explore?client_id={}'
              '&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}').format(CLIENT_ID,
                                                                                         CLIENT_SECRET,
                                                                                         VERSION,
                                                                                         lat,
                                                                                         lon,
                                                                                         category,
                                                                                         radius,
                                                                                         limit)
        try:
            results = requests.get(url).json()['response']['groups'][0]['items']
            results = json_normalize(results)
            # Filter the columns
            filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng','venue.location.distance']
            results = results.loc[:, filtered_columns]
            # Filter the category for each row
            results['venue.categories'] = results.apply(get_category_type, axis = 1)
            # Clean all column names
            results.columns = [col.split(".")[-1] for col in results.columns]
            profile[category] = results
            # print('{} offices were returned by Foursquare.'.format(results.shape[0]))
        except Exception:
            # Fall back to an empty frame so callers can still use len() and .index
            profile[category] = pd.DataFrame()
    return profile
categories = [OFFICE,HOME,COLLEGE,COMMERCE]
blr_profile = get_location_profile(blr_center[0],blr_center[1],18000,1000,categories)
pr_xs = []
pr_ys = []
for category in categories:
    for i in blr_profile[category].index:
        x,y = lonlat_to_xy(blr_profile[category]['lng'][i],blr_profile[category]['lat'][i])
        pr_xs.append(x)
        pr_ys.append(y)
    blr_profile[category]['xs'] = pr_xs
    blr_profile[category]['ys'] = pr_ys
    pr_xs = []
    pr_ys = []
blr_profile[OFFICE].head()
print ('Number of Offices :',len(blr_profile[OFFICE]))
print('Number of Residential locations :',len(blr_profile[HOME]))
print('Number of Centers with Commercial activities:',len(blr_profile[COMMERCE]))
print('Number of Colleges',len(blr_profile[COLLEGE]))
# Let's go over all the localities and compose a table
def compute_feature(lats,lons):
    gyms = {}
    yoga_centers = {}
    location_gyms = []
    location_yoga = []
    location_profile = []
    cats = [OFFICE,HOME,COMMERCE,COLLEGE]
    counter = 0
    print('Analyzing the area around the localities')
    for lat,lon in zip(lats,lons):
        counter += 1
        profile = get_location_profile(lat,lon,500,100,cats)
        location_profile.append(profile)
        # Query slightly beyond the cell radius, then keep venues within 500 m for the cell
        v_gyms = get_local_feature(lat,lon,GYM,550)
        local_gyms = []
        for i in range(0,len(v_gyms)):
            gym_name = v_gyms['name'][i]
            gym_vid = v_gyms['id'][i]
            gym_cat = v_gyms['categories'][i]
            gym_lat = v_gyms['lat'][i]
            gym_lon = v_gyms['lng'][i]
            gym_dist = v_gyms['distance'][i]
            x,y = lonlat_to_xy(gym_lon, gym_lat)
            gym_d = (gym_vid,gym_name,gym_cat,gym_lat,gym_lon,gym_dist,x,y)
            if gym_dist <= 500:
                local_gyms.append(gym_d)
            # City-wide dictionary keyed by venue id, so each gym is stored once
            gyms[gym_vid] = gym_d
        location_gyms.append(local_gyms)
        v_yoga = get_local_feature(lat,lon,YOGA,550)
        local_ygc = []
        for j in range(0,len(v_yoga)):
            yg_vid = v_yoga['id'][j]
            yg_name = v_yoga['name'][j]
            yg_cat = v_yoga['categories'][j]
            yg_lat = v_yoga['lat'][j]
            yg_lon = v_yoga['lng'][j]
            yg_dist = v_yoga['distance'][j]
            x,y = lonlat_to_xy(yg_lon, yg_lat)
            yog_d = (yg_vid,yg_name,yg_cat,yg_lat,yg_lon,yg_dist,x,y)
            if yg_dist <= 500:
                local_ygc.append(yog_d)
            yoga_centers[yg_vid] = yog_d
        location_yoga.append(local_ygc)
        if counter%100 == 0:
            print ('Analyzed {} locations'.format(counter))
    return gyms,location_gyms,yoga_centers,location_yoga,location_profile
gyms,location_gyms,yoga_centers,location_yoga,location_profile = compute_feature(latitudes,longitudes)
print('Total number of Gyms:', len(gyms))
print('Total number of Yoga Centers:', len(yoga_centers))
local_gym_count = [len(res) for res in location_gyms]
local_ygc_count = [len(yg) for yg in location_yoga ]
#print(len(local_gym_count))
blr_locations['Gyms'] = local_gym_count
blr_locations['YogaCenters'] = local_ygc_count
blr_locations.head()
print (len(location_profile))
#lets add the amenities count each location
ofc_cnt = []
hm_cnt =[]
shp_cnt = []
col_cnt = []
for i in range (0,len(location_profile)):
    ofc_cnt.append(len(location_profile[i][OFFICE]))
    hm_cnt.append( len(location_profile[i][HOME]))
    shp_cnt.append(len(location_profile[i][COMMERCE]))
    col_cnt.append(len(location_profile[i][COLLEGE]))
blr_locations['Offices'] = ofc_cnt
blr_locations['Residential'] = hm_cnt
blr_locations['Commercial'] = shp_cnt
blr_locations['Colleges'] = col_cnt
blr_locations.head()
print (location_profile[0][COLLEGE]['lng'])
categories_ = [OFFICE,HOME,COMMERCE,COLLEGE]
map_bangalore = folium.Map(location=blr_center, zoom_start=11)
folium.Marker(blr_center, popup='Bangalore').add_to(map_bangalore)
for gym in gyms.values():
    lat = gym[3]
    lon = gym[4]
    folium.CircleMarker([lat, lon], radius=3, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
for yog in yoga_centers.values():
    lat = yog[3]
    lon = yog[4]
    folium.CircleMarker([lat, lon], radius=3, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_bangalore)
#folium.GeoJson(blr_local,style_function=bbmp,name='geojson').add_to(map_bangalore)
map_bangalore
The above map shows the distribution of Gyms and Yoga centers in Bangalore
for cat in categories_:
    for lat,lon in zip(blr_profile[cat]['lat'],blr_profile[cat]['lng']):
        folium.CircleMarker([lat, lon], radius=3, color='green', fill=True, fill_color='red', fill_opacity=1).add_to(map_bangalore)
#marker_cluster = folium.MarkerCluster().add_to(map_bangalore)
for i in blr_locations.index:
    for cat in categories_:
        if len(location_profile[i][cat]) > 0:
            for n in range (0,len(location_profile[i][cat])):
                la = location_profile[i][cat]['lat'][n]
                lo = location_profile[i][cat]['lng'][n]
                folium.CircleMarker([la,lo ], radius=3, color='green', fill=True,fill_color='red', fill_opacity=1).add_to(map_bangalore)
map_bangalore
The above map shows the distribution of the various amenities.
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Marker(blr_center, popup='Bangalore').add_to(map_bangalore)
for i in blr_locations.index:
    clr = 'red'   # cell already has a Gym or Yoga Center
    if blr_locations['Gyms'][i] == 0 and blr_locations['YogaCenters'][i] == 0:
        clr = 'yellow'   # no fitness center and no amenities
        if blr_locations['Offices'][i] > 0 or blr_locations['Residential'][i] > 0 or blr_locations['Colleges'][i] > 0 or blr_locations['Commercial'][i] > 0:
            clr = 'green'   # no fitness center but at least one amenity
    folium.Circle([blr_locations['Latitude'][i], blr_locations['Longitude'][i]],radius=500,color=clr,fill_color=clr, fill=True, fill_opacity=0.3).add_to(map_bangalore)
for cat in categories_:
    for lat,lon in zip(blr_profile[cat]['lat'],blr_profile[cat]['lng']):
        folium.CircleMarker([lat, lon], radius=3, color='green', fill=True, fill_color='red', fill_opacity=1).add_to(map_bangalore)
map_bangalore
So now we have a view of all Gyms and Yoga Centers in the city. We also have an overall profile of the locations which have the amenities of our choice.
The above folium map marks in red the areas which already have a Gym or a Yoga Center. The areas marked in green don't have either but have at least one of the amenities of our choice, and will therefore be investigated further. The areas marked in yellow have neither a fitness center nor any amenities, so they could be ignored. We still investigate them further, however, as some pockets of these zones may be in close vicinity of an amenity.
This concludes the data gathering phase, and we shall use the above data in the analysis that follows.
In this project we will direct our efforts at detecting areas of Bangalore that do not have a Gym or Yoga Center. We will also look for locations which have offices, colleges, residential areas and commercial centers either in the location itself or in close vicinity.
First we will identify areas lacking a Gym or Yoga Center but having the aforesaid amenities. These locations can be taken into consideration right away.
Next are the locations which have neither a Gym or Yoga Center nor the aforesaid amenities. Such a location can still be considered a potential location if it has an amenity within 4 km of its center.
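The two selection rules above can be written as small predicates, sketched here on plain dictionaries rather than the full DataFrame. The column names match those used in this notebook; the function names and the sample cells are hypothetical. The 5 km minimum distance to a Yoga Center is not stated in the rules above and mirrors the threshold in the secondary-area query used later in this notebook.

```python
AMENITY_COLUMNS = ['Offices', 'Residential', 'Commercial', 'Colleges']

def has_fitness_venue(cell):
    return cell['Gyms'] > 0 or cell['YogaCenters'] > 0

def has_amenity(cell):
    return any(cell[col] > 0 for col in AMENITY_COLUMNS)

def is_primary(cell):
    """No Gym or Yoga Center in the cell, but at least one amenity."""
    return not has_fitness_venue(cell) and has_amenity(cell)

def is_secondary(cell, max_amenity_m=4000, min_yoga_m=5000):
    """No fitness venue and no amenity in the cell, but an amenity within reach."""
    return (not has_fitness_venue(cell)
            and not has_amenity(cell)
            and 0 < cell['Distance to Amenity'] < max_amenity_m
            and cell['Distance to Yoga Center'] > min_yoga_m)

# Hypothetical cells for illustration
busy_cell = {'Gyms': 0, 'YogaCenters': 0, 'Offices': 3, 'Residential': 1, 'Commercial': 0,
             'Colleges': 0, 'Distance to Amenity': 0, 'Distance to Yoga Center': 2500}
remote_cell = {'Gyms': 0, 'YogaCenters': 0, 'Offices': 0, 'Residential': 0, 'Commercial': 0,
               'Colleges': 0, 'Distance to Amenity': 2000, 'Distance to Yoga Center': 6000}
print(is_primary(busy_cell), is_secondary(remote_cell))
```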
create_download_link(blr_locations, filename="blr_locations.csv")
blr_locations.head()
Let's now refine our raw data to gain some further insights. We know there are many locations which do not have a Gym or Yoga Center. So first let's compute the distance to the nearest Yoga Center for each location. Likewise we shall compute the distance to the nearest Gym and to the nearest amenity for each location. This will help us further narrow down the areas of interest.
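The nearest-venue computation boils down to a minimum over Euclidean distances in projected meters. A minimal sketch, assuming venues are given as (x, y) pairs in the same UTM coordinates as the cell centers; the function name `nearest_distance` is hypothetical, and the 20000 m cap plays the same role as the initial `min_distance` in the notebook's loops.

```python
import math

def nearest_distance(cell_xy, venue_xys, cap=20000):
    """Distance in meters from a cell center to the closest venue, capped at `cap`."""
    best = cap
    for vx, vy in venue_xys:
        d = math.hypot(vx - cell_xy[0], vy - cell_xy[1])
        if d < best:
            best = d
    return best

# Hypothetical venues around a cell centered at the origin
print(nearest_distance((0, 0), [(3000, 4000), (600, 800)]))
```

The cap means a cell with no venue anywhere in the data reports 20 km rather than infinity, which keeps averages finite but also slightly understates how isolated such cells are.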
distances_to_yoga_center = []
for area_x, area_y in zip(xs, ys):
    min_distance = 20000
    for res in yoga_centers.values():
        res_x = res[6]
        res_y = res[7]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_yoga_center.append(min_distance)
blr_locations['Distance to Yoga Center'] = distances_to_yoga_center
blr_locations.head()
print('Average distance to closest Yoga Center from each area center:', blr_locations['Distance to Yoga Center'].mean())
distances_to_gym = []
for area_x, area_y in zip(xs, ys):
    min_distance = 20000
    for res in gyms.values():
        res_x = res[6]
        res_y = res[7]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_gym.append(min_distance)
blr_locations['Distance to Gym'] = distances_to_gym
blr_locations.head()
print('Average distance to closest Gym from each area center:', blr_locations['Distance to Gym'].mean())
So on average the nearest Gym is about 2.1 km from a cell center, and the nearest Yoga Center about 4.5 km.
len(location_profile)
categories = [OFFICE,HOME,COMMERCE,COLLEGE]
distance_to_nearest_amenity = []
for area_x, area_y in zip(xs, ys):
    min_distance = 20000
    for i in blr_locations.index:
        for cat in categories:
            if len(location_profile[i][cat]) > 0:
                ax = blr_locations['X'][i]
                ay = blr_locations['Y'][i]
                d = calc_xy_distance(area_x, area_y, ax, ay)
                if d < min_distance:
                    min_distance = d
    distance_to_nearest_amenity.append(min_distance)
blr_locations['Distance to Amenity'] = distance_to_nearest_amenity
print('Average distance to closest Amenity from each area center:', blr_locations['Distance to Amenity'].mean())
blr_locations.head()
create_download_link(blr_locations,title = "Download CSV file", filename = "blr_locations.csv")
blr_locations.head()
blr_locations = df_data_1
blr_locations.head()
Let's select the locations that have at least one amenity but don't have a Gym or Yoga Center.
!pip install -U pandasql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
blr_locations.head()
areas_of_interest = pd.DataFrame()
areas_of_interest = pysqldf("SELECT * FROM blr_locations WHERE YogaCenters = 0 and Gyms = 0 and (Offices > 0 or Residential > 0 or Commercial > 0 or Colleges > 0);")
print('Total number of Primary Areas:',len(areas_of_interest))
areas_of_interest.head()
secondary_areas = pysqldf("select * from blr_locations where YogaCenters = 0 and Gyms = 0 and Offices = 0 and Residential = 0 and Commercial = 0 and Colleges = 0 and [Distance to Amenity] > 0 and [Distance to Amenity] < 4000 and [Distance to Yoga Center] > 5000")
#pysqldf("SELECT * FROM blr_locations WHERE [Gyms in the Area]> 1 ").head()
print('Total Number of Secondary Area of Interest : ',len(secondary_areas))
secondary_areas.head()
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
for lat, lon in zip(areas_of_interest['Latitude'], areas_of_interest['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
    folium.Circle([lat, lon],radius=500,color='blue', fill=False).add_to(map_bangalore)
    #folium.Marker([lat, lon]).add_to(map_bangalore)
for lat, lon in zip(secondary_areas['Latitude'], secondary_areas['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
    folium.Circle([lat, lon],radius=500,color='yellow', fill=False).add_to(map_bangalore)
    #folium.Marker([lat, lon]).add_to(map_bangalore)
map_bangalore
from sklearn.cluster import KMeans
number_of_clusters = 15
good_xys = areas_of_interest[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]
map_bangalore = folium.Map(location=blr_center, zoom_start=11)
#folium.TileLayer('cartodbpositron').add_to(map_bangalore)
#HeatMap(ygc_latlons).add_to(map_bangalore)
folium.Circle(blr_center, radius=18000, color='blue', fill=False, fill_opacity=0.4).add_to(map_bangalore)
folium.Marker(blr_center).add_to(map_bangalore)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_bangalore)
    folium.Marker([lat, lon]).add_to(map_bangalore)
for lat, lon in zip(areas_of_interest['Latitude'], areas_of_interest['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
#folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_bangalore)
map_bangalore
blr_center_x, blr_center_y = lonlat_to_xy(blr_center[1],blr_center[0])
candidate_area_addresses = []
print('==============================================================')
print('Addresses of Primary Locations recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    # Reverse geocoding may return None (e.g. rate limit), so fall back to a placeholder
    addr = (get_locality_address(GEO_KEY, lat, lon) or 'NO ADDRESS').replace(', India', '')
    candidate_area_addresses.append(addr)
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, blr_center_x, blr_center_y)
    print('{}{} => {:.1f}km from Bangalore Center'.format(addr, ' '*(50-len(addr)), d/1000))
number_of_clusters = 15
ok_xys = secondary_areas[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(ok_xys)
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]
map_bangalore = folium.Map(location=blr_center, zoom_start=12)
#folium.TileLayer('cartodbpositron').add_to(map_bangalore)
#HeatMap(ygc_latlons).add_to(map_bangalore)
folium.Circle(blr_center, radius=18000, fill=False, fill_opacity=0.4).add_to(map_bangalore)
folium.Marker(blr_center).add_to(map_bangalore)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_bangalore)
    folium.Marker([lat, lon]).add_to(map_bangalore)
for lat, lon in zip(secondary_areas['Latitude'], secondary_areas['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
#folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_bangalore)
map_bangalore
candidate_area_addresses = []
print('==============================================================')
print('Addresses of Secondary Locations recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    # Reverse geocoding may return None (e.g. rate limit), so fall back to a placeholder
    addr = (get_locality_address(GEO_KEY, lat, lon) or 'NO ADDRESS').replace(', India', '')
    candidate_area_addresses.append(addr)
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, blr_center_x, blr_center_y)
    print('{}{} => {:.1f}km from Bangalore Center'.format(addr, ' '*(50-len(addr)), d/1000))
This concludes our analysis. We have created 15 addresses representing centers of zones containing locations with no Gyms or Yoga Centers but still close to residential/office/commercial locations. Although zones are shown on the map with a radius of ~500 meters (green circles), their shape is actually very irregular, and their centers/addresses should be considered only as a starting point for exploring area neighborhoods in search of potential locations for fitness centers.
Our analysis shows that a city as populous as Bangalore is catered for by only a few hundred Gyms and even fewer Yoga Centers. Thus there is great potential for a new Yoga Center or even a multi-speciality fitness center.
Initially we divided the entire area of Bangalore into circular cells of 500 m radius and analyzed each location. This gave us data showing which locations have fitness centers and which lack them entirely.
We assessed the feasibility of each location by understanding its profile. If a location has offices, homes, commercial centers or colleges, it indicates that the cell is inhabited and that some economic activity can be inferred. Such locations without a fitness center should therefore be our primary areas of interest. There could also be areas which are remote but still within manageable distance of a community center. Such locations form the secondary set of locations that could be considered.
We then used KMeans clustering to arrive at 15 potential primary locations and 15 potential secondary locations.
The location candidates were clustered to create zones of interest which contain the greatest number of location candidates. Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.
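For readers unfamiliar with the clustering step, the sketch below shows the basic idea of k-means (Lloyd's algorithm) in plain Python on a handful of made-up cell coordinates. It is not the scikit-learn implementation used in this notebook, which chooses its starting centers more carefully; the function name `kmeans_sketch`, the naive "first k points" initialisation and the sample points are all assumptions for illustration.

```python
import math

def kmeans_sketch(points, k, iterations=20):
    """Plain Lloyd's algorithm on 2D points; returns k cluster centers."""
    # Naive deterministic initialisation: use the first k points as centers
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center
        groups = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda c: math.hypot(x - centers[c][0], y - centers[c][1]))
            groups[nearest].append((x, y))
        # Update step: move each center to the mean of its group
        new_centers = []
        for c, group in enumerate(groups):
            if group:
                new_centers.append((sum(p[0] for p in group) / len(group),
                                    sum(p[1] for p in group) / len(group)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers

# Hypothetical candidate cells (X/Y in meters) forming two obvious groups
cells = [(0, 0), (10000, 10000), (0, 1000), (1000, 0), (10000, 11000), (11000, 10000)]
zone_centers = kmeans_sketch(cells, k=2)
print(zone_centers)
```

Each returned center is the average position of the candidate cells assigned to it, which is why a zone center may fall between cells rather than on one, and why the addresses should be treated as starting points only.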
The purpose of this project was to identify potential locations for starting a new Yoga Center or Fitness Center in Bangalore. Using the Foursquare data we could gain insight into every location and the count of Gyms and Yoga Centers it contains. We could also get an overview of each location's profile, which helped us narrow down the potential locations.
We then clustered those locations to identify general zones which could be used as starting points for further exploration.
The final decision on the optimal location for a fitness center / Yoga Center will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like the attractiveness of each location, real estate availability, prices, and the social and economic dynamics of every neighborhood.